Natural Language Processing for Expertise Modelling
in E-mail Communication*



Abstract. One way to find the information that may be required is to approach
a person who is believed to possess it, or to identify a person who knows where
to look for it. Technical support, which automatically compiles individual
expertise and makes this accessible, may be centred on an expert finder system.
A central component of such a system is a user profile, which describes user
expertise level in discussed subjects. Previous works have made attempts to
weight user expertise by using content-based methods, which associate the
expertise level with the analysis of keywords, irrespective of any semantic
meanings conveyed. This paper explores the idea of using a natural language
processing technique to understand given information from both a structural and
semantic perspective in building user profiles. With its improved interpretation
capability compared to prior works, it aims to enhance the performance
accuracy in ranking the order of names of experts, returned by a system against
a help-seeking query. To demonstrate its efficiency, e-mail communication is
chosen as an application domain, since its closeness to spoken dialogue makes
it possible to focus on the linguistic attributes of user information in the process
of expertise modelling. Experimental results from a case study show a 23%
higher performance on average over 77% of the queries tested with the
approach presented here.

1 Introduction 

A crucial task in the distributed environments in which most organizations operate is
to effectively manage the useful knowledge held by individuals. Not only does this
supplement additional resources, but it also contributes timely and up-to-date
procedural and factual knowledge to enterprises. In order to maximize
individually held resources, it is necessary to encourage people to share such valuable
data. As their expertise is accumulated through task achievement, it is also important
to exploit it as it is created. Such an approach allows individuals to work as normal
without demanding changes in working environments [7].

An expert finder is a system designed to locate people who have sought-after
knowledge to solve a specific problem [4]. It answers knowledge-seeking queries
with the names of potential helpers, in order to establish personal contacts
which link novices to experts. The ultimate goal of such a system is to create
environments where users are aware of each other, maximizing their current resources
and actively exchanging up-to-date information. Although expert finder systems
cannot always generate correct answers, bringing the relevant people together
provides opportunities for them to become aware of each other, and to have further
discussions, which may uncover hidden expertise.

* This work was funded by the University Technology Partnership (UTP) for Design, which is a
collaboration between Rolls-Royce, BAE Systems and the Universities of Cambridge,
Sheffield and Southampton.

In designing technical support to maximize the use of such personal expertise, two
issues have to be addressed: 1) how to simulate the personal contacts found in real
environments, and 2) how to capture personal expertise while allowing users to work
as they normally do without demanding changes in working environments. The
exploitation of e-mail communication, which can be enhanced as a communication-based
learning tool, where individual experiences are shared among communicators,
is proposed as an enabling technology. E-mail communication has become a major
means of exchanging information and acquiring social or organisational relationships,
implying that it would be a good source of information about recent and useful
cooperative activities among users. It is hypothesized that, because of its popularity,
information mined from e-mail communication can be considered as information
from expertise discovery sources [3; 7]. In addition, as it represents an everyday
activity, it requires no major changes to working environments, which makes it
suitable as a test environment.

A decision about whether an individual is an expert for a given problem may be
made by consulting user profiles. Drawn from information retrieval studies, the
frequencies of keywords have been extensively used for extracting user information
from exchanged e-mail messages. However, there are at least three reasons why such
an approach is inadequate when applied to expertise modelling. First, counting
keywords is not adequate for determining whether a given document is factual
information or contains some level of author expertise. Secondly, without
understanding the semantic meanings of keywords, it is possible to assume that
different words represent the same concept and vice versa, which triggers the retrieval
of non-relevant information. Finally, it is not easy to distinguish question-type texts
from potential answer documents, which support retrieval of the relevant documents
for the given query. In addition, the argument that user expertise is action-centred
and is often distributed in the individual's action-experiences is the motivation behind
this work, which relies on linguistic-oriented user modelling [2]. With this approach,
when we regard the given messages as the realization of involved knowledge, user
expertise can be verbalized as a direct indication of user views on discussed subjects,
and the levels of expertise are distinguished by taking into account the degree of
significance of the words employed in the messages.

In this paper, a new expertise model, EMNLP (Expertise Modelling using Natural
Language Processing), that captures the different levels of expertise reflected in
exchanged e-mail messages, and makes use of such expertise in facilitating a correct
ranking of experts, is presented. It examines the application of NLP (Natural
Language Processing) techniques and user modelling to the development of an expert
finder system based on e-mail communication. The creation of an expert finder
system that promptly accommodates new information is one of the two main themes
of this paper; improving its competency values by using NLP for profiling
of users is the second.
2 Related Work

KnowledgeMail from Tacit Corp is the system most related to EMNLP, in that it adds
an automatic profiling ability to some existing commercial e-mail systems, to
support information sharing through executing queries about the profiles constructed
[7]. User profiles are formulated as a list of weight-valued terms by using one
statistical method. A survey focusing on the system's performance reveals that users
tend to spend extra time cleaning up their profiles in order to reduce false hits, which
erroneously recommend them as experts due to unresolved ambiguous terms [3]. In an
effort to reduce such problems, the application of NLP to profiling users is suggested.
As a consequence, EMNLP is expected to generate more meaningful terms in user
profiles.

Maybury et al. [4] developed an expert finder system that exploits the intellectual
products created within an organisation to support automated expertise identification.
The system considered a user as an expert if he/she was linked to a wide range of
documents and/or a large number of documents about that topic. It combines multiple
pieces of evidence demonstrating associations with the user in determining the level of
expertise of the user. This is comparable to EMNLP in that it qualifies experts by
requiring detailed evidence; however, it differs in that such evidence is collected from
the measurement of information usage patterns, rather than from the analysis of the
meanings and functional roles of such information.

Based on the Java programming domain, the system described by Vivacqua et al.
[9] models a user's programming skill by reading source code files, and analysing what
classes, libraries or methods are used and how often. This result is then compared to
the overall usage for the remaining users, to determine the levels of expertise for
specific topics (e.g., methods). Its automatic profiling and mapping of five levels of
expertise (i.e., expert-advanced [...]) [...] EMNLP. However, the expertise assignment
function is rather too simplified in so far
as it disregards various coding patterns that might reveal the different skills of experts
and beginners.

3 Descriptions of EMNLP 

A design objective of EMNLP is to improve the efficiency of the search task, which
ranks people's names in decreasing order of expertise against a help-seeking query. Its
contribution is to turn once simply archived e-mail messages into knowledge
repositories by approaching them from a linguistic perspective, which regards the
exchanged messages as the realization of verbal communication among users. Its
supporting assumption is that user expertise is best extracted by focusing on the
sentences where users' viewpoints are explicitly expressed. NLP is identified as an
enabling technology that analyzes e-mail messages with two aims: 1) to classify
sentences into syntactical structures (syntactic analysis), and 2) to extract users'
expertise levels using the functional roles of given sentences (semantic interpretation).
Figure 1 shows the procedure for using EMNLP, i.e., how to create user profiles from
the collected messages. Further details of the NLP components are explained with the
dotted line. Contents are decomposed into a set of paragraphs, and heuristics (e.g.,
locating a full stop) are applied in order to break down each paragraph into sentences.
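
The decomposition step just described can be sketched as follows. This is a minimal illustration, not the segmentation used in EMNLP: the function names are hypothetical, paragraphs are assumed to be separated by blank lines, and the sentence heuristic treats '?' and '!' as boundaries in addition to the full stop mentioned in the text.

```python
import re


def split_paragraphs(body: str) -> list[str]:
    """Split a message body into paragraphs (assumed: separated by blank lines)."""
    return [p.strip() for p in re.split(r"\n\s*\n", body) if p.strip()]


def split_sentences(paragraph: str) -> list[str]:
    """Break a paragraph after a full stop, '?' or '!' that is followed by whitespace."""
    text = " ".join(paragraph.split())  # collapse line breaks inside the paragraph
    return [s for s in re.split(r"(?<=[.?!])\s+", text) if s]


def segment(body: str) -> list[list[str]]:
    """Return one list of sentences per paragraph."""
    return [split_sentences(p) for p in split_paragraphs(body)]


if __name__ == "__main__":
    message = (
        "I know we need phase analysis. Rob plotted the results.\n\n"
        "Please send\nthe data."
    )
    for sentences in segment(message):
        print(sentences)
```

A heuristic of this kind will mis-split abbreviations such as "e.g."; a production system would need additional rules for those cases.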
Fig. 1: The procedure for using NLP-based user profiling

Syntactical analysis identifies the syntactic roles of words in a sentence by using a
corpus annotation [1]. The Apple Pie Parser, which is a bottom-up probabilistic chart
parser and finds the parse tree with the best score by the best-first search algorithm, is
used for this purpose [6]. The syntactical analysis supports the location of a main verb
in a sentence, by decomposing the sentence into a group of grammatically related
phrases, such as "noun", "adverb", "adjective", "verb", or "preposition".

Given the structural information about each sentence, semantic analysis examines
sentences with two criteria: 1) whether the employed verb verbalizes the speaker's
attitudes, and 2) whether the sentence has a "first person" (e.g., "I", "in my opinion",
or "We") subject. This analysis is based on Speech Act Theory (SAT), which
proposes that communication involves the speaker's expression of an attitude (i.e., an
illocutionary act) towards the contents of the communication [8]. It suggests that
information can be delivered with different communication effects on recipients
depending on different speakers' attitudes, which are expressed using an appropriate
illocutionary act, which represents a particular function of communication. The
performance of the speech act is described by a verb, which serves as a core element
and the central organizer of the sentence. In addition, the fact that working practices are
reflected through task achievement implies that personal expertise can be regarded as
action-centred, emphasizing the important role of a "first person" subject in expertise
modelling [2].

EMNLP extracts user expertise from the sentences that have first person
subjects and determines expertise levels based on the identified main verbs. Whereas
SAT reasons about how different illocutionary verbs convey the various intentions of
speakers, NLP determines the intention by mapping the central verb in the sentence to
the pre-defined illocutionary verb. The decision about the level of user expertise is
made according to the defined hierarchies of the verbs, initially provided by SAT.
SAT provides the categories of illocutionary verbs (i.e., assertive, commissive,
directive, declarative, and expressive), each of which contains a set of exemplary
verbs. EMNLP further extends the hierarchy in order to increase its coverage for
practicability by using the WordNet database [5]. EMNLP first examines all verbs
occurring in the collected messages, and then singles out the verbs that have not yet
been mapped onto the hierarchy. For each such verb, it consults the WordNet database
in order to assign a value through chaining its synonyms; for example, if a synonym of
the given verb is classified into the "assertive" value, then this verb is also assigned to
"assertive".
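
The synonym-chaining step can be illustrated with a small sketch. Everything concrete here is an assumption made for illustration: the seed verbs, the toy `SYNONYMS` table (a stand-in for real WordNet lookups), the breadth-first strategy and the `max_depth` cut-off are not taken from the paper.

```python
from collections import deque

# Seed verbs already placed in the hierarchy (illustrative members only).
SEED = {"assert": "assertive", "promise": "commissive", "request": "directive"}

# Toy stand-in for WordNet synonym lookups (hypothetical data).
SYNONYMS = {
    "maintain": ["assert", "keep"],
    "affirm": ["maintain"],
    "pledge": ["promise"],
    "plot": ["chart"],
}


def assign_category(verb, seed=SEED, synonyms=SYNONYMS, max_depth=3):
    """Follow synonym links breadth-first until a verb with a known category is met.

    Returns the category name, or None when no categorised synonym is reachable
    within max_depth links.
    """
    seen = {verb}
    queue = deque([(verb, 0)])
    while queue:
        word, depth = queue.popleft()
        if word in seed:
            return seed[word]
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for syn in synonyms.get(word, []):
            if syn not in seen:
                seen.add(syn)
                queue.append((syn, depth + 1))
    return None


if __name__ == "__main__":
    for v in ("affirm", "pledge", "plot"):
        print(v, "->", assign_category(v))
```

Limiting the chain length is one way to stop a verb from inheriting a category through a long run of loosely related synonyms; whether EMNLP applies such a limit is not stated in the text.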
To clarify how two sentences, which may be assumed to contain similar
keywords, are mapped onto different profiles, consider two example sentences: 1)
"For the 5049 testing, phase analysis on those high frequency results that Rob plotted
is needed", and 2) "For the 5049 testing, I know we need phase analysis on those high
frequency results that Rob plotted". The main verb values for both sentences (i.e.,
"need" and "know") are equivalent to "Strong Working Knowledge", which conveys a
relatively high level of knowledge for a speaker. However, the difference is that, when
compared to the first, the second sentence clearly conveys the speaker's intention as it
begins with "I know". As a consequence, it is regarded as demonstrating expertise
while the first sentence is not. Information extracted from the first sentence is mapped
onto a lower-level expertise.
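
The two example sentences can be traced through a small sketch. It assumes the subject and main verb have already been identified by the parser. The level names follow the five pre-defined levels used in the evaluation, and the verb values for "need" and "know" come from the example above. Demoting a non-first-person sentence by exactly one level is an assumption, since the text only says "lower-level".

```python
# Expertise levels from lowest to highest.
LEVELS = [
    "Working Interests",
    "Strong Working Interests",
    "Working Knowledge",
    "Strong Working Knowledge",
    "Expert-Level Knowledge",
]

# Verb values stated in the example; a full table would come from the verb hierarchy.
VERB_LEVEL = {"know": "Strong Working Knowledge", "need": "Strong Working Knowledge"}

FIRST_PERSON = {"i", "we"}


def expertise_level(subject: str, main_verb: str):
    """Map a parsed sentence (subject, main verb) to an expertise level.

    Sentences without a first-person subject are demoted by one level
    (an assumed rule for illustration).
    """
    base = VERB_LEVEL.get(main_verb.lower())
    if base is None:
        return None  # verb not in the hierarchy
    if subject.lower() in FIRST_PERSON:
        return base
    return LEVELS[max(LEVELS.index(base) - 1, 0)]


if __name__ == "__main__":
    print(expertise_level("phase analysis", "need"))  # sentence 1
    print(expertise_level("I", "know"))               # sentence 2
```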

4 Experimental Results 

A case study has been developed to test two hypotheses, namely 1) that EMNLP
produces comparable or higher accuracy in differentiating expertise from factual
information compared to that of the frequency-based statistical model, and 2) that
differentiating expertise from factual information supports more effective query
processing in locating the right experts. As a baseline, a frequency-based statistical
model, which builds user profiles by weighting presented terms without considering
their meanings or purposes, was used.
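
A frequency-based baseline of this kind might look like the sketch below. The paper does not give the weighting scheme, so relative term frequency, the simple tokenizer, the summed-weight query score and all function names are assumptions chosen for illustration.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case the text and keep alphanumeric runs as terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def term_weights(messages: list[str]) -> dict[str, float]:
    """Weight each term by its relative frequency over a user's messages."""
    counts = Counter(t for m in messages for t in tokenize(m))
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()} if total else {}


def rank_users(query: str, profiles: dict[str, dict[str, float]]) -> list[str]:
    """Order users by the summed weights of the query terms, highest first."""
    q_terms = tokenize(query)
    scores = {u: sum(w.get(t, 0.0) for t in q_terms) for u, w in profiles.items()}
    return sorted(scores, key=lambda u: (-scores[u], u))  # ties broken by name


if __name__ == "__main__":
    profiles = {
        "alice": term_weights(["phase analysis of test results", "phase plots"]),
        "bob": term_weights(["meeting agenda for Monday"]),
    }
    print(rank_users("phase analysis", profiles))
```

Because every occurrence of a term counts equally, a question about phase analysis raises the sender's score as much as an authoritative statement does, which is the limitation the first hypothesis targets.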

A total of 10 users, who work for the same department in a professional
engineering design company, participated in the experiment, and a period of three to
four months was spent collecting e-mail messages. A total of 18 queries were
created for a testing dataset, and a maximum of 40 names of predicted
experts, i.e., 20 names extracted using EMNLP and 20 names from the statistical
model, were shown to a user, who was the group leader of the other users. As a
manager, the user was able to evaluate the retrieved names according to the five
pre-defined expertise levels: "Expert-Level Knowledge", "Strong Working Knowledge",
"Working Knowledge", "Strong Working Interests", and "Working Interests".
Figure 2 summarizes the results measured by normalised precision. For 4 queries
(i.e., 4, 12, 14, 18), EMNLP produced lower performance rates than the
statistical approach. However, for 14 queries, its ranking results were more accurate,
and at the highest point, it outperformed the statistical method with a 35% higher
precision value. The precision-recall curve, which demonstrates a 23% higher
precision value for EMNLP, is shown in Figure 3. The differences of precision values
at different recall thresholds are rather small with EMNLP, implying that its precision
values are relatively higher than those of the statistical model.
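
The text does not define normalised precision. One widely used definition from the information retrieval literature compares the ranks at which relevant items appear with the ideal ranking, and the sketch below assumes that definition. With relevant items at ranks r_1..r_n in a ranked list of N items, the measure is 1 - (sum of ln r_i - sum of ln i) / ln C(N, n). The function name and the example ranks are hypothetical.

```python
import math


def normalised_precision(relevant_ranks: list[int], collection_size: int) -> float:
    """Normalised precision: 1.0 for an ideal ranking, 0.0 for the worst one.

    relevant_ranks  -- 1-based ranks at which the relevant names were returned
    collection_size -- total number of ranked names (N)
    """
    n = len(relevant_ranks)
    if not 0 < n < collection_size:
        raise ValueError("need at least one relevant and one non-relevant item")
    if len(set(relevant_ranks)) != n or not all(
        1 <= r <= collection_size for r in relevant_ranks
    ):
        raise ValueError("ranks must be distinct and within 1..collection_size")
    actual = sum(math.log(r) for r in relevant_ranks)
    ideal = sum(math.log(i) for i in range(1, n + 1))
    worst_gap = math.log(math.comb(collection_size, n))  # gap of the worst ranking
    return 1.0 - (actual - ideal) / worst_gap


if __name__ == "__main__":
    # Three relevant experts among 20 returned names.
    print(normalised_precision([1, 2, 3], 20))  # ideal ordering
    print(normalised_precision([2, 5, 9], 20))  # a weaker ordering
```

The logarithm makes displacements near the top of the list count more than displacements further down, which suits a task where only the first few recommended names are likely to be contacted.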

A close examination of the queries used for testing reveals that the statistical model
has a better capability in processing general-type queries that search for non-specific
factual information, since 1) as we regard user expertise as action-oriented,
knowledge is distinguished from such factual information, implying that it is difficult
to value factual information as knowledge with EMNLP, and 2) EMNLP is limited in
exploring various ways of determining the level of expertise in that it constrains user
expertise to be expressed through the first person in a sentence.

5 Future Work

EMNLP was developed to improve the accuracy of ranking the order of expert
names by use of the NLP technique to capture explicitly stated user expertise, which
otherwise may be ignored. Its improved ranking order, compared to that of a
statistical method, was mainly due to the use of an enriched expertise acquisition
technique, which successfully distinguished experienced users from novices. We
presume that EMNLP would be particularly useful when applied to large
organizations, where it is vital to improve retrieval performance since typical queries
may be answered with a list of a few hundred potential expert names.

Special attention is given to gathering domain-specific terminologies, possibly
collected from technical documents such as task manuals or memos. This is
particularly useful for the semantic analysis, which identifies concepts and
relationships within the NLP framework, since these terminologies are not retrievable
from general-purpose dictionaries (e.g., the WordNet database).